50 research outputs found

    Exploiting different levels of parallelism in the biological sequence comparison problem

    In recent years, the fast growth of the bioinformatics field has attracted the attention of computer scientists. At the same time, the exponential growth of databases that contain biological information (such as protein and DNA data) demands great efforts to improve the performance of computational platforms. In this work, we investigate how bioinformatics applications benefit from parallel architectures that combine different alternatives to exploit coarse- and fine-grain parallelism. As a case study, we analyze the performance behavior of the Ssearch application, which implements the Smith-Waterman algorithm (SW), a dynamic programming approach that explores the similarity between a pair of sequences. The inherent large parallelism of the application makes it ideal for architectures supporting multiple dimensions of parallelism (thread-level parallelism, TLP; data-level parallelism, DLP; instruction-level parallelism, ILP). We study how this algorithm can take advantage of different parallel machines like the SGI Altix, IBM Power6, IBM Cell BE and MareNostrum machines. Our study includes a qualitative analysis of the parallelization opportunities and also the quantification of the performance in terms of speedup and execution time. These measures are collected taking into account the specific characteristics of each architecture. As an example, our results show that a shared-memory multiprocessor architecture (SMP) like the PowerPC 970MP of the MareNostrum machine can surpass a heterogeneous multiprocessor machine like the current IBM Cell BE. Peer reviewed. Postprint (published version).

    Performance analysis of sequence alignment applications

    Advances in molecular biology have led to continued growth in the biological information generated by the scientific community. Additionally, this area has become a multi-disciplinary field, including components of mathematics, biology, chemistry, and computer science, generating several challenges in the scientific community from different points of view. For this reason, bioinformatic applications represent an increasingly important workload. However, even though the importance of this field is clear, common bioinformatic applications and their implications for micro-architectural design have not received enough attention from the computer architecture community. This paper presents a micro-architecture performance analysis of recognized bioinformatic applications for the comparison and alignment of biological sequences, including BLAST, FASTA and some recognized parallel implementations of the Smith-Waterman algorithm that use the Altivec SIMD extension to speed up performance. We adopt a simulation-based methodology to perform a detailed workload characterization. We analyze architectural and micro-architectural aspects like pipeline configurations, issue widths, functional unit mixes, memory hierarchy and their implications for performance behavior. We have found that the memory subsystem is the component with the greatest impact on the performance of the BLAST heuristic, the branch predictor is responsible for the major performance loss for FASTA and SSEARCH34, and long dependency chains are the limiting factor in the SIMD implementations of Smith-Waterman. Peer reviewed. Postprint (published version).

    The SARC architecture

    The SARC architecture is composed of multiple processor types and a set of user-managed direct memory access (DMA) engines that let the runtime scheduler overlap data transfer and computation. The runtime system automatically allocates tasks on the heterogeneous cores and schedules the data transfers through the DMA engines. SARC's programming model supports various highly parallel applications, with matching support from specialized accelerator processors. Postprint (published version).

    Parallel processing in biological sequence comparison using general purpose processors

    The comparison and alignment of DNA and protein sequences are important tasks in molecular biology and bioinformatics. One of the best-known algorithms to perform the string-matching operation present in these tasks is the Smith-Waterman algorithm (SW). However, it is a computation-intensive algorithm, and many researchers have developed heuristic strategies to avoid using it, especially when searching large databases. There are several efficient implementations of the SW algorithm on general purpose processors. These implementations try to extract data-level parallelism by taking advantage of single-instruction multiple-data (SIMD) extensions, capable of performing several operations in parallel on a set of data. In this paper, we propose a more efficient data-parallel implementation of the SW algorithm. Our proposed implementation obtains a 30% reduction in the execution time relative to the previous best data-parallel alternative. We also review different alternative implementations of the SW algorithm, compare them with our proposal, and present preliminary results for some heuristic implementations. Finally, we present a detailed study of the computational complexity of the different alignment algorithms presented and their behavior on different aspects of the CPU microarchitecture. Peer reviewed. Postprint (published version).

    On the scalability of 1- and 2-dimensional SIMD extensions for multimedia applications

    SIMD extensions are the most common technique used in current processors for multimedia computing. In order to obtain more performance for emerging applications, SIMD extensions need to be scaled. In this paper we perform a scalability analysis of SIMD extensions for multimedia applications. Scaling a 1-dimensional extension, like Intel MMX, is compared to scaling a 2-dimensional (matrix) extension. Evaluations demonstrate that the 2-d architecture is able to use more parallel hardware than the 1-d extension. Speed-ups over a 2-way superscalar processor with an MMX-like extension go up to 4X for kernels and up to 3.3X for complete applications, and the matrix architecture can deliver, in some cases, more performance with simpler processor configurations. The experiments also show that the scaled matrix architecture is reaching the limits of the DLP available in the internal loops of common multimedia kernels. Peer reviewed. Postprint (published version).

    Strict Selection Alone of Patients Undergoing Liver Transplantation for Hilar Cholangiocarcinoma is Associated with Improved Survival

    Liver transplantation for hilar cholangiocarcinoma (hCCA) has regained attention since the Mayo Clinic reported their favorable results with the use of a neo-adjuvant chemoradiation protocol. However, debate remains whether the success of the protocol should be attributed to the neo-adjuvant therapy or to the strict selection criteria that are being applied. The aim of this study was to investigate the value of patient selection alone on the outcome of liver transplantation for hCCA. In this retrospective study, patients who were transplanted for hCCA between 1990 and 2010 in Europe were identified using the European Liver Transplant Registry (ELTR). Twenty-one centers reported 173 patients (69%) of a total of 249 patients in the ELTR. Twenty-six patients were wrongly coded, resulting in a study group of 147 patients. We identified 28 patients (19%) who met the strict selection criteria of the Mayo Clinic protocol, but had not undergone neo-adjuvant chemoradiation therapy. Five-year survival in this subgroup was 59%, which is comparable to patients with pretreatment pathologically confirmed hCCA who were transplanted after completion of the chemoradiation protocol at the Mayo Clinic. In conclusion, although the results should be cautiously interpreted, this study suggests that with strict selection alone, improved survival after transplantation can be achieved, approaching the Mayo Clinic experience.

    The impact of non-additive genetic associations on age-related complex diseases

    Genome-wide association studies (GWAS) are not fully comprehensive, as current strategies typically test only the additive model, exclude the X chromosome, and use only one reference panel for genotype imputation. We implement an extensive GWAS strategy, GUIDANCE, which improves genotype imputation by using multiple reference panels and includes the analysis of the X chromosome and non-additive models to test for association. We apply this methodology to 62,281 subjects across 22 age-related diseases and identify 94 genome-wide associated loci, including 26 previously unreported. Moreover, we observe that 27.7% of the 94 loci are missed if we use standard imputation strategies with a single reference panel, such as HRC, and only test the additive model. Among the new findings, we identify three novel low-frequency recessive variants with odds ratios larger than 4, which need at least a three-fold larger sample size to be detected under the additive model. This study highlights the benefits of applying innovative strategies to better uncover the genetic architecture of complex diseases. Most genome-wide association studies assume an additive model, exclude the X chromosome, and use one reference panel. Here, the authors implement a strategy including non-additive models and find that the number of loci for age-related traits increases as compared to the additive model alone. Peer reviewed.

    Reduced diversity and increased virulence-gene carriage in intestinal enterobacteria of coeliac children

    Background: Coeliac disease is an immune-mediated enteropathology triggered by the ingestion of cereal gluten proteins. This disorder is associated with imbalances in the composition of the gut microbiota that could be involved in its pathogenesis. The aim of the present study was to determine whether intestinal Enterobacteriaceae populations of active and non-active coeliac patients and healthy children differ in diversity and virulence-gene carriage, so as to establish a possible link between the pathogenic potential of enterobacteria and the disease. Methods: Enterobacteriaceae clones were isolated on VRBD agar from faecal samples of 31 subjects (10 active coeliac patients, 10 symptom-free coeliac patients and 11 healthy controls) and identified at species level by the API 20E system. Escherichia coli clones were classified into four phylogenetic groups A, B1, B2 and D and the prevalence of eight virulence-associated genes (type-1 fimbriae [fimA], P fimbriae [papC], S fimbriae [sfaD/E], Dr haemagglutinin [draA], haemolysin [hlyA], capsule K1 [neuB], capsule K5 [KfiC] and aerobactin [iutA]) was determined by multiplex PCR. Results: A total of 155 Enterobacteriaceae clones were isolated. Non-E. coli clones were more commonly isolated in healthy children than in coeliac patients. The four phylogenetic E. coli groups were equally distributed in healthy children, while in both groups of coeliac patients most commensal isolates belonged to group A. Within the virulent groups, B2 was the most prevalent in active coeliac disease children, while D was the most prevalent in non-active coeliac patients. E. coli clones of the virulent phylogenetic groups (B2+D) from active and non-active coeliac patients carried a higher number of virulence genes than those from healthy individuals. Prevalence of P fimbriae (papC), capsule K5 (sfaD/E) and haemolysin (hlyA) genes was higher in E. coli isolated from active and non-active coeliac children than in those from control subjects. Conclusion: This study has demonstrated that virulence features of the enteric microbiota are linked to coeliac disease.

    ReLA, a local alignment search tool for the identification of distal and proximal gene regulatory regions and their conserved transcription factor binding sites

    Motivation: The prediction and annotation of the genomic regions involved in gene expression have been largely explored. Most of the energy has been devoted to the development of approaches that detect transcription start sites, leaving the identification of regulatory regions and their functional transcription factor binding sites (TFBSs) largely unexplored and with important quantitative and qualitative methodological gaps.